Efficient Polynomial Evaluation Algorithm and Implementation on FPGA

Authors

  • Simin Xu
  • Wei Ting Loke

Abstract

In this thesis, an optimized polynomial evaluation algorithm is presented. Horner’s Rule requires the fewest computation steps but has the longest latency, while parallel evaluation methods such as Estrin’s method are fast but incur a large hardware overhead. The proposed algorithm achieves a high level of parallelism with the smallest area by replacing multiplications with squarings. To realize this performance gain, an efficient integer squarer is proposed and implemented on an FPGA using fewer DSP blocks. Previous work presented a tiling method for a double-precision squarer that uses the fewest DSP blocks reported so far. However, it incurs a large LUT overhead and has a complex, irregular structure that does not extend to larger word sizes. The circuit proposed in this thesis reduces DSP block usage by the same amount as the tiling method while incurring a much lower LUT overhead: 21.8% fewer LUTs for a 53-bit squarer. The circuit is mapped to a Xilinx Virtex 6 FPGA and evaluated over a wide range of operand word sizes, demonstrating its scalability and efficiency. With the novel squarer, the proposed polynomial algorithm achieves a 41% latency reduction over conventional Horner’s Rule for a degree-5 polynomial with 11.9% less area, and a 44.8% latency reduction for a degree-4 polynomial with 5% less area on FPGA. In contrast, Estrin’s method occupies 26% and 16.5% more area than Horner’s Rule to achieve the same level of speed improvement for the same degree-5 and degree-4 polynomials, respectively.
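The latency/area trade-off between the two baseline methods named in the abstract can be illustrated with a small software model. This is a sketch of the textbook Horner and Estrin schemes only, not the thesis's proposed algorithm, which the abstract describes only as replacing multiplications with squarings. The function names are illustrative. Coefficients are assumed to be listed from the constant term upward.

```python
from typing import Sequence


def horner(coeffs: Sequence[float], x: float) -> float:
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Degree n needs n multiplications and n additions, but each step depends
    on the previous one, so the critical path is n multiply-add stages.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the same polynomial with Estrin's scheme.

    Adjacent pairs are combined as c[2i] + c[2i+1] * x (all independent, so
    they can run in parallel), then pairs of those results are combined with
    x**2, then x**4, and so on. The powers x**2, x**4, ... come from repeated
    squaring, so the critical path grows with log2(n) instead of n, at the
    cost of more multipliers working at once.
    """
    terms = list(coeffs)
    power = x  # x, then x**2, x**4, ... obtained by squaring
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad an odd-length level with a zero term
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # squaring step for the next level
    return terms[0] if terms else 0.0
```

For a degree-5 polynomial (six coefficients), `horner` performs five dependent multiply-add stages, while `estrin` finishes in three levels (pairs with x, then x², then x⁴). The extra parallel multipliers in those levels are the area overhead that the abstract attributes to Estrin’s method.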
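The abstract does not describe the squarer's internal structure. The sketch below shows two well-known properties of integer squaring that let a dedicated squarer use less hardware than a general multiplier. It is not the circuit proposed in the thesis. Both function names and the split point `k` are hypothetical choices for illustration.

```python
def square_split(a: int, k: int) -> int:
    """Square a non-negative integer by splitting it into high and low parts.

    With a = hi * 2**k + lo:
        a**2 = hi**2 * 2**(2k) + 2 * hi * lo * 2**k + lo**2
    The two cross products hi*lo and lo*hi are identical, so only three
    sub-products are needed instead of four, and two of them are squares.
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    hi, lo = a >> k, a & ((1 << k) - 1)
    hi_sq = hi * hi
    lo_sq = lo * lo
    cross = hi * lo
    # cross << (k + 1) == 2 * hi * lo * 2**k
    return (hi_sq << (2 * k)) + (cross << (k + 1)) + lo_sq


def partial_product_bits(n: int) -> tuple[int, int]:
    """Count partial-product bits: n-bit multiplier vs. folded n-bit squarer.

    A multiplier forms n*n bits a_i*b_j. In a squarer, a_i*a_j == a_j*a_i,
    so each off-diagonal pair merges into one bit shifted left by one, and
    a_i*a_i == a_i, leaving n*(n-1)/2 off-diagonal bits plus n diagonal bits.
    """
    return n * n, n * (n - 1) // 2 + n
```

For the 53-bit case mentioned in the abstract, `partial_product_bits(53)` gives 2809 bits for a general multiplier versus 1431 for a folded squarer. This roughly halved array is the kind of saving a squarer can turn into fewer DSP blocks or LUTs.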

Publication date: 2013